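The Shift from Task-Specific AI to General-Purpose Large Language Models

The Paradigm Shift in Artificial Intelligence

1. From Specificity to Generality

The AI field has undergone a major change in how models are trained and deployed.

Old paradigm (task-specific training): Models such as early CNNs, or BERT variants fine-tuned for one job, were each trained toward a single objective (for example, sentiment analysis). Other tasks, such as translation or summarization, each required a separate model.
New paradigm (centralized pre-training + prompting): A single large model (a large language model, LLM) learns general world knowledge from an internet-scale dataset. Afterwards, changing only the input prompt can steer it toward a wide range of language tasks.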
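The prompting idea in the new paradigm can be sketched in a few lines. This is only an illustration under stated assumptions: `TEMPLATES` and `build_prompt` are hypothetical names, the template wording is made up, and the resulting string would be passed to whichever LLM is in use. The point is that the task changes while the model does not.

```python
# Hypothetical prompt templates: one model, many tasks, only the prompt changes.
TEMPLATES = {
    "sentiment": (
        "Classify the sentiment of the following text as positive or negative.\n\n"
        "Text: {text}\nSentiment:"
    ),
    "translate": (
        "Translate the following text into French.\n\n"
        "Text: {text}\nTranslation:"
    ),
    "summarize": (
        "Summarize the following text in one sentence.\n\n"
        "Text: {text}\nSummary:"
    ),
}


def build_prompt(task: str, text: str) -> str:
    """Return the prompt for `task`; no model weights are touched."""
    if task not in TEMPLATES:
        raise ValueError(f"unknown task: {task}")
    return TEMPLATES[task].format(text=text)


if __name__ == "__main__":
    review = "The battery lasts all day and the screen is gorgeous."
    for task in TEMPLATES:
        print(f"--- {task} ---")
        print(build_prompt(task, review))
```

Under the old paradigm, each of these three tasks would have needed its own trained model. Here the only thing that varies is the string handed to the model.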

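2. Evolution of the Architecture

Encoder-centric (the BERT era): Focused on understanding and classification. These models read text bidirectionally to capture deep context, but they are not designed to generate new text.
Decoder-centric (the GPT/Llama era): The current standard for generative AI. These models use autoregressive modeling to predict the next token, which suits open-ended generation and dialogue.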
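One concrete way to see the difference between the two families is the attention mask, which says which positions each token is allowed to look at. The sketch below uses plain Python lists rather than a real deep learning library, and the function names are illustrative assumptions. A 1 means "may attend", a 0 means "blocked".

```python
def bidirectional_mask(n: int) -> list[list[int]]:
    """Encoder-style mask: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n: int) -> list[list[int]]:
    """Decoder-style mask: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    tokens = ["the", "cat", "sat", "down"]
    masks = [
        ("bidirectional (encoder)", bidirectional_mask(len(tokens))),
        ("causal (decoder)", causal_mask(len(tokens))),
    ]
    for name, mask in masks:
        print(name)
        for tok, row in zip(tokens, mask):
            print(f"  {tok:>5}: {row}")
```

The causal mask is lower-triangular: no token can see what comes after it, which is what makes next-token prediction a meaningful training target. The bidirectional mask lets every token use the full sentence, which helps classification but gives no natural left-to-right generation procedure.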

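3. Main Drivers of the Shift

Self-supervised learning: Training on vast amounts of unlabeled internet data, which removes the bottleneck of human labeling.
Scaling laws: The empirical observation that model performance improves in a predictable way as model size (parameters), data volume, and compute increase.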
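The "predictable" part of scaling laws is often described with a power-law curve. The sketch below assumes a loss of the form L(N) = (N_c / N)^alpha for a model with N parameters. The constants `n_c` and `alpha` are placeholder values chosen for illustration, not fitted results, and the compute estimate of roughly 6 FLOPs per parameter per training token is a commonly quoted rule of thumb for dense transformers rather than an exact figure.

```python
def power_law_loss(n_params: float, n_c: float = 1e13, alpha: float = 0.08) -> float:
    """Illustrative loss curve L(N) = (n_c / N) ** alpha.

    n_c and alpha are placeholder values for illustration only.
    """
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return (n_c / n_params) ** alpha


def approx_training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb estimate: about 6 FLOPs per parameter per training token."""
    return 6.0 * n_params * n_tokens


if __name__ == "__main__":
    for n in (1e8, 1e9, 1e10):
        loss = power_law_loss(n)
        flops = approx_training_flops(n, 20 * n)  # 20 tokens per parameter, assumed
        print(f"params={n:.0e}  loss={loss:.3f}  train_flops={flops:.1e}")
```

Under a power law, each doubling of model size multiplies the loss by the same factor (here 2 to the power of minus alpha), which is why the trend looks like a straight line on a log-log plot and can be extrapolated before a larger model is trained.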

Key Insight

AI has evolved from 'task-specific tools' into 'general-purpose agents' that show emergent abilities such as reasoning and in-context learning.

Question 1

What is the primary difference between the "Old Paradigm" and the "New Paradigm" of AI?

Moving from cloud computing to local processing.

Moving from task-specific training to centralized pre-training with prompting.

Moving from Python to C++ for model development.

Moving from Decoder-only to Encoder-only architectures.

Question 2

According to scaling laws, which three factors are fundamentally linked to model performance?

Internet speed, RAM size, and CPU cores.

Human annotators, code efficiency, and server location.

Model size (parameters), data volume (tokens), and total computation.

Prompt length, temperature setting, and top-k value.

Challenge: Evaluating Architectural Fitness

Apply your knowledge of model architectures to real-world scenarios.

You are an AI architect tasked with selecting the right foundational approach for two different projects. You must choose between an Encoder-only (like BERT) or a Decoder-only (like GPT) architecture.

Task 1

You are building a system that only needs to classify incoming emails as "Spam" or "Not Spam" based on the entire context of the message. Which architecture is more efficient for this narrow task?

Solution: Encoder-only (e.g., BERT)

Because the task is classification and requires deep, bidirectional understanding of the text without needing to generate new text, an Encoder-only model is highly efficient and appropriate.

Task 2

You are building a creative writing assistant that helps authors brainstorm ideas and write the next paragraph of their story. Which architecture is the modern standard for this?

Solution: Decoder-only (e.g., GPT/Llama)

This task requires open-ended text generation. Decoder-only models are designed specifically for auto-regressive next-token prediction, making them the standard for generative AI applications.
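To make "auto-regressive next-token prediction" concrete, here is a toy sketch that uses word-pair counts in place of a neural network. It is not how GPT or Llama work internally; `train_bigram` and `generate` are hypothetical helper names, and the corpus is invented. It does show the two ideas at small scale: the training targets come from the raw text itself (no human labels), and generation feeds each predicted word back in as the next input.

```python
from collections import Counter, defaultdict


def train_bigram(text: str) -> dict[str, Counter]:
    """Count next-word frequencies; the 'labels' are simply the following words."""
    words = text.lower().split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return counts


def generate(counts: dict[str, Counter], start: str, max_new_words: int = 5) -> list[str]:
    """Greedy autoregressive decoding: append the likeliest next word, then repeat."""
    output = [start]
    for _ in range(max_new_words):
        followers = counts.get(output[-1])
        if not followers:
            break  # the last word was never followed by anything in training
        output.append(followers.most_common(1)[0][0])
    return output


if __name__ == "__main__":
    corpus = "the cat sat on the mat . the cat ate the fish ."
    model = train_bigram(corpus)
    print(" ".join(generate(model, "on", max_new_words=2)))
```

A real decoder-only model replaces the count table with a transformer conditioned on the whole preceding context, and usually samples from the predicted distribution instead of always taking the top choice, but the generation loop has the same shape.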